A Two-stage Speaker Adaptation Approach for Subspace Gaussian Mixture Model based Nonnative Speech Recognition
Abstract
Nonnative speech recognition is becoming increasingly important as speech applications are deployed worldwide. Given the large population of nonnative speakers, speaker adaptation remains the most practical way to provide high-performance speech services. The Subspace Gaussian Mixture Model (SGMM) has recently been shown to yield superior performance on various native speech recognition tasks. In this paper, we investigate different SGMM speaker adaptation techniques for nonnative speech recognition. We propose a two-stage direct model adaptation approach based on an analysis of the roles of the SGMM model parameters. Our initial experiments verify that the proposed approach is much more effective than traditional feature-space Maximum Likelihood Linear Regression (MLLR) on SGMM-based nonnative speaker adaptation tasks.
Similar resources
Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model
Speech is one of the richest and most immediate ways to express human emotional characteristics, and it conveys cognitive and semantic concepts among humans. In this study, a statistical method for emotion recognition from speech signals is proposed, and a learning approach based on the statistical model is introduced to classify the internal feelings of the utterance....
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Subspace Gaussian Mixture Models for Large Vocabulary Speech Recognition
The subspace Gaussian mixture model (GMM) is an alternative approach for approximating the probability density function (p.d.f.) of a set of independent and identically distributed (i.i.d.) data using prior density estimates. In this approach, the prior density of the GMM parameters is estimated from a development dataset, and when predicting newly enrolled data, the prior knowledge can be utilised by criteria lik...
Approaches to Speech Recognition based on Speaker Recognition Techniques
We have experimented with approaches to speech recognition that are inspired by work from the speaker recognition community, including an approach that was used in IBM’s best speech recognition submissions in its January 2009 evaluation. Here we explain in general terms the techniques used, without the full technical details. We initially used an approach based on Maximum a Posteriori (MAP) ada...